unsupervised environment design
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
A wide range of reinforcement learning (RL) problems, including robustness, transfer learning, unsupervised RL, and emergent complexity, require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error-prone and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent's learning progress, and minimax adversarial training leads to worst-case environments that are often unsolvable. To generate structured, solvable environments for our protagonist agent, we introduce a second, antagonist agent that is allied with the environment-generating adversary. The adversary is motivated to generate environments that maximize regret, defined as the difference between the protagonist and antagonist agents' returns. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
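The regret signal that drives the PAIRED adversary can be sketched in a few lines. This is a minimal sketch, not the authors' code, and it assumes regret is estimated as the antagonist's best episode return minus the protagonist's mean episode return on the same generated environment; the function name `paired_regret` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def paired_regret(antagonist_returns: Sequence[float],
                  protagonist_returns: Sequence[float]) -> float:
    """Estimate the regret the adversary is rewarded for on one environment.

    Assumption: regret is approximated by the antagonist's best episode
    return minus the protagonist's mean episode return, both collected on
    the same generated environment.
    """
    if not antagonist_returns or not protagonist_returns:
        raise ValueError("need at least one episode from each agent")
    return max(antagonist_returns) - mean(protagonist_returns)


if __name__ == "__main__":
    # Solvable environment: the antagonist reaches the goal, the protagonist
    # only sometimes does, so the regret is positive.
    print(paired_regret([1.0, 0.0], [0.0, 0.5]))  # 0.75
    # Unsolvable environment: neither agent earns reward, so regret is 0
    # and the adversary gains nothing by proposing it.
    print(paired_regret([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

The second call shows why the antagonist matters: an environment nobody can solve yields zero regret, so the adversary is pushed toward environments that are solvable (by the antagonist) but not yet solved (by the protagonist).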
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
Geonwoo Cho, Jaegyun Im, Jihwan Lee, Hojun Yi, Sejin Kim, Sundong Kim
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED. Project Page: https://geonwoo.me/traced/
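The abstract does not give TRACED's formulas, so the following is only a sketch under stated assumptions. It assumes the value-function-loss term is the "positive value loss" used in earlier regret-based UED work such as Robust Prioritized Level Replay (the mean of the positive generalized advantage estimates over an episode), that the transition-prediction error is the mean squared error of a learned dynamics model's next-state predictions, and that the two are combined linearly with a hypothetical weight `alpha`. Co-Learnability is not modeled here, and all function names are hypothetical.

```python
from typing import Sequence


def positive_value_loss(rewards: Sequence[float], values: Sequence[float],
                        gamma: float = 0.99, lam: float = 0.95) -> float:
    """Mean of the positive GAE terms over one episode.

    `values` holds V(s_0), ..., V(s_T); the last entry is the bootstrap value
    (use 0.0 for a terminal state).
    """
    T = len(rewards)
    if T == 0 or len(values) != T + 1:
        raise ValueError("values must have exactly len(rewards) + 1 entries")
    advantage, total = 0.0, 0.0
    # Accumulate GAE backwards: A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        advantage = delta + gamma * lam * advantage
        total += max(advantage, 0.0)  # keep only positive advantages
    return total / T


def transition_prediction_error(predicted: Sequence[Sequence[float]],
                                observed: Sequence[Sequence[float]]) -> float:
    """Mean squared error between predicted and observed next states."""
    if len(predicted) == 0 or len(predicted) != len(observed):
        raise ValueError("need equally many, non-empty predictions and observations")
    errors = [sum((p - o) ** 2 for p, o in zip(pred, obs))
              for pred, obs in zip(predicted, observed)]
    return sum(errors) / len(errors)


def level_score(rewards: Sequence[float], values: Sequence[float],
                predicted: Sequence[Sequence[float]],
                observed: Sequence[Sequence[float]],
                alpha: float = 0.5) -> float:
    """Hypothetical TRACED-style score: value-loss regret proxy plus a
    weighted transition-prediction error. `alpha` is an assumed weight."""
    return (positive_value_loss(rewards, values)
            + alpha * transition_prediction_error(predicted, observed))
```

A teacher could rank candidate levels by `level_score`, so that levels where the student's dynamics model is still inaccurate get priority even when the value loss alone is small.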
Improving Environment Novelty Quantification for Effective Unsupervised Environment Design
Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to guide curriculum design. Regret-driven methods generate curricula that progressively increase environment complexity for the student but overlook environment novelty, a critical element for enhancing an agent's generalizability. Measuring environment novelty is especially challenging due to the underspecified nature of environment parameters in UED, and existing approaches face significant limitations.
Review for NeurIPS paper: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Additional Feedback: Could you add a specific example problem that would be easily solved by framing it as UED? I think it would help the paper in general. Is the agent for the Lava environment, which replaces walls with dangerous lava, trained on generated maps with lava instead of walls? Additionally, the paper needs further proofreading; some minor mistakes I found:
* Line 511: I wouldn't start a proof section by saying "it would be nice to know that..."; that is too informal.
* Line 512: "their" should be "its".
* Line 40: "Section?", i.e., the referenced section number is missing.
* Line 138: shouldn't the function T_M be defined on S_M?
* Line 171: this sentence needs further explanation.
* Line 207: "based on" appears twice.
* Line 209: "Figure?", i.e., the referenced figure number is missing.
The Broader Impact section, especially the first paragraph, is too speculative: automating jobs and automated weapons are general problems of the AI field, so it should focus more on the impact of this specific work.
Review for NeurIPS paper: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
This paper pursues a significant line of enquiry regarding an important topic: automatic, unsupervised environment design. The paper makes algorithmic, theoretical, and empirical contributions. While the reviewers had some concerns about the clarity of the theory and the adequacy of the empirical results, these have been well addressed in the rebuttal. The authors are strongly urged to incorporate all the reviewers' feedback in the final version.
Grounding Aleatoric Uncertainty for Unsupervised Environment Design
Adaptive curricula in reinforcement learning (RL) have proven effective for producing policies robust to discrepancies between the training and test environments. Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution. We formalize this phenomenon as curriculum-induced covariate shift (CICS), and describe how its occurrence in aleatoric parameters can lead to suboptimal policies. Directly sampling these parameters from the ground-truth distribution avoids the issue, but thwarts curriculum learning. We propose SAMPLR, a minimax regret UED method that optimizes the ground-truth utility function, even when the underlying training data is biased due to CICS.
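The failure mode behind CICS can be made concrete with a toy example. This sketch is not SAMPLR; it only shows how a policy that is optimal under a curriculum-shifted distribution over an aleatoric parameter can be suboptimal under the ground-truth distribution. The binary parameter, the reward, and both probabilities are hypothetical.

```python
# Toy illustration of curriculum-induced covariate shift (CICS).
# Hypothetical setup: a hidden aleatoric parameter z in {0, 1}; the agent
# cannot observe z and must guess it; reward is 1 if the guess is correct.


def expected_reward(action: int, p_z1: float) -> float:
    """Expected reward of always guessing `action` when P(z = 1) = p_z1."""
    return p_z1 if action == 1 else 1.0 - p_z1


def best_action(p_z1: float) -> int:
    """Action that maximizes expected reward under the given distribution."""
    return max((0, 1), key=lambda a: expected_reward(a, p_z1))


GROUND_TRUTH_P = 0.2   # assumed deployment distribution over z
CURRICULUM_P = 0.7     # assumed shifted distribution seen during training

if __name__ == "__main__":
    trained = best_action(CURRICULUM_P)    # 1: optimal for the curriculum
    ideal = best_action(GROUND_TRUTH_P)    # 0: optimal for deployment
    print(trained, expected_reward(trained, GROUND_TRUTH_P))
    print(ideal, expected_reward(ideal, GROUND_TRUTH_P))
```

Because the agent cannot observe z, the best it can do is act on the distribution of z, which is exactly what the curriculum distorts; sampling z from the ground truth would fix the policy here, but it would also remove the curriculum's control over that parameter.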